ABSTRACT

Speech is coded such that it can be generated by a pulse excitation sequence filtered by an LPC (linear predictive coding) filter. The sequence contains, in each of successive frame periods, pulses whose positions and amplitudes may be varied. These variables are selected at the coding end to reduce the error between the input and regenerated speech signals. The selection process involves derivation of an initial estimate followed by an iterative adjustment process in which pulses having a low energy contribution are tested in alternative positions and transferred to them if a reduced error results.

18 Claims, 9 Drawing Sheets 



[Front-page drawing: flow chart of the coding algorithm (see FIG. 2).]



12/26/2001, EAST Version: 1.02.0008 



U.S. Patent Jul.24,1990 Sheet 1 of 8 4,944,013 




[Drawing sheet 1: plots not recoverable from the text extraction. The only legible label, "PULSE PATTERN ALTERATION", appears on two plots.]



Fig. 2.

[Flow chart boxes, in order:]

INITIAL ESTIMATE OF k EXCITATION PULSE POSITIONS (IPE)

BLOCK SOLUTION FOR THE AMPLITUDES OF IPE

CALCULATION OF THE WEIGHTED ERROR ENERGY W_n (MAY BE FROM LOOK-UP TABLE OF FUNCTIONS)

SELECTION OF A SINGLE PULSE OUT OF k

SELECTION OF POSITION FOR POSSIBLE PULSE TRANSFER (DESTINATION)

BLOCK SOLUTION FOR THE AMPLITUDES OF THE NEW k PULSE SEQUENCE

CALCULATION OF THE WEIGHTED ERROR ENERGY W (MAY BE FROM LOOK-UP TABLE OF FUNCTIONS)

NEW POSITION AND AMPLITUDE RETAINED






[Graph, figure label not legible: vertical axis marked 10dBs to 12dBs; horizontal axis PREDICTOR FILTER ORDER, 6 to 16.]

Fig. 5.

[Graph: vertical axis marked 9dBs to 11dBs; horizontal axis NOISE SHAPING CONSTANT g, 1.0 to 0.6; curves A and B.]






Fig. 11.

[Graph: vertical axis marked 11dBs to 16dBs; horizontal axis PULSE RATIOS TO BE TESTED FOR TRANSFER, 0.17 to 0.67; curve labels 10, 15 and 20.]




MULTI-PULSE SPEECH CODER

CROSS REFERENCES TO RELATED APPLICATIONS

This application is related to copending commonly assigned, later filed, U.S. patent application Ser. No. 187,533 filed May 3, 1988, now U.S. Pat. No. 4,864,621 and UK patent application 8/00120.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention is concerned with speech coding, and more particularly to systems in which a speech signal can be generated by feeding the output of an excitation source through a synthesis filter. The coding problem then becomes one of generating, from input speech, the necessary excitation and filter parameters. LPC (linear predictive coding) parameters for the filter can be derived using well-established techniques, and the present invention is concerned with the excitation source.

2. Description of Related Art

Systems in which a voiced/unvoiced decision on the input speech is made to switch between a noise source and a repetitive pulse source tend to give the speech output an unnatural quality, and it has been proposed to employ a single "multipulse" excitation source in which a sequence of pulses is generated, no prior assumptions being made as to the nature of the sequence. It is found that, with this method, only a few pulses (say 6 in a 10 ms frame) are sufficient for obtaining reasonable results. See B. S. Atal and J. R. Remde: "A New Model of LPC Excitation for Producing Natural-sounding Speech at Low Bit Rates", Proc. IEEE ICASSP, Paris, pp. 614, 1982.

Coding methods of this type offer considerable potential for low bit rate transmission, e.g. 9.6 to 4.8 Kbit/s.

The coder proposed by Atal and Remde operates in a "trial and error feedback loop" mode in an attempt to define an optimum excitation sequence which, when used as an input to an LPC synthesis filter, minimizes a weighted error function over a frame of speech. However, the unsolved problem of selecting an optimum excitation sequence is at present the main reason for the enormous complexity of the coder which limits its real time operation.

The excitation signal in multipulse LPC is approximated by a sequence of pulses located at non-uniformly spaced time intervals. It is the task of the analysis by synthesis process to define the optimum locations and amplitudes of the excitation pulses.

In operation, the input speech signal is divided into frames of samples, and a conventional analysis is performed to define the filter coefficients for each frame. It is then necessary to derive a suitable multipulse excitation sequence for each frame. The algorithm proposed by Atal and Remde forms a multipulse sequence which, when used to excite the LPC synthesis filter, minimizes (that is, within the constraints imposed by the algorithm) a mean-squared weighted error derived from the difference between the synthesized and original speech. This is illustrated schematically in FIG. 1. The positions and amplitudes of the excitation pulses are encoded and transmitted together with the digitized values of the LPC filter coefficients. At the receiver, given the decoded values of the multipulse excitation and the prediction coefficients, the speech signal is recovered at the output of the LPC synthesis filter.

In FIG. 1 it is assumed that a frame consists of n speech samples, the input speech samples being $s_0 \ldots s_{n-1}$ and the synthesized samples $s_0' \ldots s_{n-1}'$, which can be regarded as vectors $\mathbf{s}$, $\mathbf{s}'$. The excitation consists of pulses of amplitude $a_m$ which are, it is assumed, permitted to occur at any of the n possible time instants within the frame, but there are only a limited number of them (say k). Thus the excitation can be expressed as an n-dimensional vector $\mathbf{a}$ with components $a_0 \ldots a_{n-1}$, but only k of them are non-zero. The objective is to find the 2k unknowns (k amplitudes, k pulse positions) which minimize the error:

$$e^2 = (\mathbf{s} - \mathbf{s}')^2 \qquad (1)$$

ignoring the perceptual weighting, which serves simply to filter the error signal such that, in the final result, the residual error is concentrated in those parts of the speech band where it is least obtrusive.

The amount of computation required to do this is enormous and the procedure proposed by Atal and Remde was as follows:

(1) Find the amplitude and position of one pulse, alone, to give a minimum error.

(2) Find the amplitude and position of a second pulse which, in combination with this first pulse, gives a minimum error; the positions and amplitudes of the pulse(s) previously found are fixed during this stage.

(3) Repeat for further pulses.

This procedure could be further refined by finally reoptimizing all the pulse amplitudes; or the amplitudes may be reoptimized prior to derivation of each new pulse.

SUMMARY OF THE INVENTION

It will be apparent that in these procedures the results are not optimum, inter alia because the positions of all but the kth pulse are derived without regard to the positions or values of the later pulses: the contribution of each excitation pulse to the energy of the synthesized signal is influenced by the choice of the other pulses. In vector terms, this can be explained by noting that the contribution of $a_m$ is $a_m \mathbf{f}_m$, where $\mathbf{f}_m$ is the LPC filter's impulse response vector displaced by m, and that the set of vectors $\mathbf{f}_m$ are not, in general, orthogonal (where $m = 0 \ldots n-1$).

The present invention offers a method of deriving pulse parameters which, while still not optimum, is believed to represent an improvement.

According to one aspect of the present invention we provide a method of speech coding comprising:

receiving speech samples;

processing the speech samples to derive parameters representing a synthesis filter response;

deriving, from the parameters and the speech samples, pulse position and amplitude information defining an excitation consisting, within each of successive time frames corresponding to a plurality of speech samples, of a pulse sequence containing a smaller plurality of pulses, the pulse amplitudes and positions being controlled so as to reduce an error signal obtained by comparing the speech samples with the response of the synthesis filter to the excitation;
wherein the pulse position and amplitude information is derived by:

(1) deriving an initial estimate of the positions and amplitudes of the pulses, and

(2) carrying out an iterative adjustment process in which individual pulses are selected and their positions and amplitudes reassessed.

BRIEF DESCRIPTION OF THE DRAWINGS

Some embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:

FIG. 1 is a block diagram illustrating the coding process;

FIG. 2 is a brief flowchart of the algorithm used in the exemplary embodiment of the present invention;

FIGS. 3a and 3b illustrate the operation of the pulse transfer iteration;

FIGS. 4 to 7 are graphs illustrating the signal-to-noise ratios that may be obtained;

FIG. 8 is a graph of energy gain function against pulse energy; and

FIGS. 9 to 11 are graphs illustrating results obtained using the function illustrated in FIG. 8.

DESCRIPTION OF THE PREFERRED EMBODIMENT(S)

It has already been explained that the objective is to find, for each time frame, the parameters of the k non-zero pulses of the desired excitation $\mathbf{a}$. For convenience the excitation is redefined in terms of a k-dimensional vector $\mathbf{c}$ containing the amplitude values $c_1$ to $c_k$, and pulse positions $p_i$ ($i = 1 \ldots k$) which indicate where these pulses occur in the n-dimensional vector. The flow chart of the algorithm used in an exemplary embodiment of the invention is shown in FIG. 2. An initial position estimate of the pulse positions $p_i$, $i = 1, 2, \ldots k$, is first determined. A block solution for the optimum amplitudes then defines the initial k-pulse excitation sequence and a weighted error energy $W_n$ is obtained from the difference between the synthesized and the input speech.

The selection of only one pulse follows whose position $p_m$ might be altered within the analysis frame. The algorithm decides on a new possible location for this pulse and the block solution is used to determine the optimum amplitudes of this new k-pulse sequence, which shares k-1 pulse locations with the previous excitation sequence. The new location is retained only if the corresponding weighted error energy W is smaller than $W_n$ obtained from the previous excitation signal.

The search process continues by selecting again one pulse out of the k available pulses and altering its position, while the above procedure is repeated. The final k-pulse sequence is established when all the available destination positions within the analysis frame have been considered for the possibility of a single pulse transfer.

The search algorithm which defines (i) the location of a pulse suitable for transfer and (ii) its destination, is of importance in the convergence of the method towards a minimum weighted error. Different search algorithms for pulse selection and transfer will be considered below.

Firstly, we consider the initial estimate step. In principle, any of a number of procedures could be used, including the multistage sequential search procedures discussed above proposed by other workers. However, a simplified procedure is preferred, on the basis that the reduction in accuracy can be more than compensated for by the pulse transfer stage, and that the overall computational requirement can be kept much the same.

One possibility is to find the maxima of the cross correlation between the input speech and the LPC filter's impulse response. However, as voiced speech results in a smooth cross-correlation which offers a limited number of local maxima, a multistage sequential search algorithm is preferred.

The synthesized speech vector is:

$$\mathbf{s}' = \sum_{m=0}^{n-1} a_m \mathbf{f}_m + \mathbf{m} \qquad (2)$$

where $\mathbf{m}$ is the filter's memory from previously synthesized frames.

Since only k values of the excitation are non-zero, Eq. 2 can be written as:

$$\mathbf{s}' = \sum_{i=1}^{k} a_{p_i} \mathbf{f}_{p_i} + \mathbf{m} \qquad (3)$$

where $p_i$ is the location index. Consider that the n normalized vectors

$$\mathbf{b}_m = \frac{\mathbf{f}_m}{\|\mathbf{f}_m\|}$$

define a basis of unit vectors in an n-dimensional space. Eq. 3 shows that the synthesized speech vector can be thought of as the sum of k n-dimensional vectors $a_{p_i} \|\mathbf{f}_{p_i}\| \mathbf{b}_{p_i}$ which are obtained by analysing $\mathbf{s}'$ in a k-dimensional subspace defined by the $\mathbf{b}_{p_i}$, $i = 1, 2, \ldots k$ unit vectors.

At each stage of the search the location of an additional excitation pulse is determined by first obtaining the n orthogonal projections $\mathbf{q}_i$, $i = 0, 1, \ldots n-1$ of an input vector $\mathbf{s}_d$ onto the n axes of the analysis space and then selecting the projection $\mathbf{q}_{max}$ with the maximum magnitude. These projections correspond to the cross-correlation between $\mathbf{s}_d$ and the basis vectors $\mathbf{b}_i$, $i = 0, 1, \ldots n-1$. The vector $\mathbf{s}_d$ is updated at each stage of the process by subtracting $\mathbf{q}_{max}$ from it; its initial value is the input speech vector $\mathbf{s}$ minus the filter memory $\mathbf{m}$.

This algorithm can be implemented without the need to find $\mathbf{s}_d$ prior to the calculation of all the cross correlation values $\|\mathbf{q}_i\|$ at each stage of the process. Instead, $\mathbf{q}_l$, $l = 0, 1, \ldots n-1$, are defined directly using the linearity property of projection. Thus at the jth stage of the process $\mathbf{q}_l(j)$ is formed by subtracting the projection of $\mathbf{q}_{max}(j-1)$ onto the n axes from $\mathbf{q}_l(j-1)$, i.e.

$$\mathbf{q}_l(j) = \mathbf{q}_l(j-1) - \mathrm{Proj}\,[\mathbf{q}_{max}(j-1)]_l, \qquad l = 0, 1, \ldots, n-1 \qquad (4)$$

However, as $\mathbf{q}_{max} = \|\mathbf{q}_{max}\|\, \mathbf{b}_i$, where $\mathbf{b}_i$ is the unit basis vector of the axis where $\mathbf{q}_{max}$ lies, the orthogonal projections of $\mathbf{q}_{max}$ onto the n axes are:

$$\mathrm{Proj}\,[\mathbf{q}_{max}]_l = \|\mathbf{q}_{max}\|\; \mathbf{b}_i \cdot \mathbf{b}_l, \qquad l = 0, 1, \ldots, n-1 \qquad (5)$$
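The sequential search of Eqs. 4 and 5 can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the function names are hypothetical, positions are counted from 0, the projections are kept as signed scalars (so the sign of a pulse is carried through the update), already-chosen positions are explicitly excluded, and the impulse response is assumed to have a non-zero first sample.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def unit_basis(impulse, n):
    """b_m = f_m / ||f_m||, where f_m is the impulse response delayed by m
    samples and truncated to the n-sample frame (assumes impulse[0] != 0)."""
    basis = []
    for m in range(n):
        f_m = [impulse[t - m] if 0 <= t - m < len(impulse) else 0.0
               for t in range(n)]
        norm = math.sqrt(dot(f_m, f_m))
        basis.append([v / norm for v in f_m])
    return basis


def initial_positions(s_d, impulse, k):
    """Pick k pulse positions by repeated projection (Eqs. 4 and 5)."""
    n = len(s_d)
    b = unit_basis(impulse, n)
    # Stage 1: signed projections of the target onto all n unit vectors.
    q = [dot(s_d, b[l]) for l in range(n)]
    positions = []
    for _ in range(k):
        # Largest-magnitude projection among positions not yet used
        # (the exclusion is an assumption added for robustness).
        i = max((l for l in range(n) if l not in positions),
                key=lambda l: abs(q[l]))
        positions.append(i)
        q_max = q[i]
        # Eq. 4 with Eq. 5: remove the projection of q_max from every axis.
        q = [q[l] - q_max * dot(b[i], b[l]) for l in range(n)]
    return positions
```

For example, a 12-sample target built from two pulses (amplitude 3 at position 2, amplitude -2 at position 8) through the impulse response `[1.0, 0.5, 0.25]` yields the positions `[2, 8]`. The dot products `dot(b[i], b[l])` are recomputed here for clarity; a practical coder would precompute or approximate them.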
Note that (i) the above dot products $B_{il} = \mathbf{b}_i \cdot \mathbf{b}_l$, $l = 0, 1, \ldots n-1$, are normalized autocovariance estimates of the LPC filter's impulse response, and (ii) k.n autocovariance estimates are needed for each analysis frame.

Thus during the first stage of the method, n cross-correlation values $\|\mathbf{q}_i\|$, $i = 0, 1, \ldots n-1$ are calculated between the input speech vector $\mathbf{s}$ and $\mathbf{b}_i$. The maximum value $\|\mathbf{q}_{max}\|$ is then detected to define the location and amplitude of the first excitation pulse. In the next stage the n values $\|\mathbf{q}_{max}\| B_{il}$, $l = 0, 1, \ldots n-1$ are subtracted from the previously found cross correlation values and a new maximum value is determined which provides the location and amplitude of the next pulse. This continues until the locations of the k excitation pulses are found.

The complexity of the algorithm can be considerably reduced by approximating the normalized autocovariance estimates of the LPC filter's impulse response $B_{il}$ with normalized autocorrelation estimates $R_{il}$ whose value depends only on the $|l - i|$ difference, viz. $R_{il} = R_{0,|l-i|}$. In this case only n autocorrelation estimates are calculated for each analysis frame compared to the k.n previously required. The performance of this simplified algorithm, in accurately locating the excitation pulse positions, is reduced when compared to that of the original method. The above approximation however makes the simplified method very satisfactory in providing the initial position estimates.

The initial position estimate may be modified to take account of a perceptual weighting, in which case the filter coefficients $\mathbf{f}_m$ (and hence the normalised vectors $\mathbf{b}$) would be replaced by those corresponding to the combined filter response; and the signal for analysis is also modified.

The pulse positions having been determined, the amplitudes may then be derived. Once a set of k pulse positions is given a "block" approach is used to define the pulse amplitudes. The method is designed to minimize the energy of a weighted error signal formed from the difference between the input $\mathbf{s}$ and the synthesized $\mathbf{s}'$ speech vectors. $\mathbf{s}'$ is obtained at the output of the LPC synthesis filter $F(z) = 1/[1 - P(z)]$ as:

$$\mathbf{s}' = R\,\mathbf{a} + \mathbf{m} \qquad (6)$$

where R is the n×n lower triangular convolution matrix

$$R = \begin{bmatrix} r_0 & 0 & \cdots & 0 \\ r_1 & r_0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ r_{n-1} & r_{n-2} & \cdots & r_0 \end{bmatrix} \qquad (7)$$

$r_k$ is the kth value of the F(z) filter's impulse response, $\mathbf{a}$ is the vector containing the n values of the excitation and $\mathbf{m}$ is the filter's memory from the previously synthesized frames.

Since the excitation vector $\mathbf{a}$ consists of k pulses and n-k zeros, Eq. 6 can be written as:

$$\mathbf{s}' = S\,\mathbf{c} + \mathbf{m} \qquad (8)$$

where S is now an n×k convolution matrix formed from the columns of R which correspond to the k pulse locations, and $\mathbf{c}$ contains the k unknown pulse amplitudes. The error vector

$$\mathbf{e} = \mathbf{s} - \mathbf{m} - S\,\mathbf{c} = \mathbf{x} - S\,\mathbf{c} \qquad (9)$$

where $\mathbf{x} = \mathbf{s} - \mathbf{m}$, has an energy $\mathbf{e}^T\mathbf{e}$ which can be minimized using Least Squares, and the optimum vector $\mathbf{c}$ is given by:

$$\mathbf{c} = (S^T S)^{-1} S^T \mathbf{x} \qquad (10)$$

As previously mentioned, the error however has a flat characteristic and is not a good measure of the perceptual difference between the original and the synthesized speech signals. In general, due to the relatively high concentration of speech energy in formant regions, larger errors can be tolerated in the formant regions than in the regions between formants. The shape of the error spectrum is therefore modified using a linear shaping filter V, whence the weighted error $\mathbf{u}$ is given by:

$$\mathbf{u} = V\mathbf{x} - V S\,\mathbf{h} = \mathbf{y} - D\,\mathbf{h} \qquad (11)$$

where $\mathbf{y}$ and D correspond to the "transformed" by V signal $\mathbf{x}$ and convolution matrix S respectively. An error is therefore defined in terms of both the shaping filter V and the excitation sequence $\mathbf{h}$ required to produce the perceptually shaped error $\mathbf{u}$. The actual error is still of course $\mathbf{x} - S\,\mathbf{h}$ and is designated $\mathbf{e}'$, whence

$$\mathbf{e}' = V^{-1}\mathbf{u} \qquad (12)$$

Furthermore $\mathbf{u}^T\mathbf{u}$ is minimized when

$$\mathbf{h} = (D^T D)^{-1} D^T \mathbf{y} \qquad (13)$$

in which case the spectrum of $\mathbf{u}$ is flat and its energy is

$$\mathbf{u}^T\mathbf{u} = \mathbf{y}^T\mathbf{y} - \mathbf{h}^T D^T \mathbf{y} \qquad (14)$$

The "perceptually optimum" excitation sequence $\mathbf{h}$ is therefore found by minimizing the energy of the error between the signal $\mathbf{x}$ and the synthesized signal, both of which have been modified according to the noise shaping filter V(z). Since the minimization is performed in a modified n-dimensional space, the actual error energy $\mathbf{e}'^T\mathbf{e}'$ (see FIG. 1) is expected to be larger than the error energy $\mathbf{e}^T\mathbf{e}$ found using $\mathbf{c}$ from Eq. 10.

The filter V(z) is set to:

$$V(z) = \frac{1 - P(z)}{1 - P(z/g)} \qquad (15)$$

where g controls the degree of shaping applied on the flat spectrum of $\mathbf{u}$ (Eq. 12). When g=1 there is no shaping, while when g=0 then $V(z) = [1 - P(z)]$ and full spectral shaping is applied. The choice of g is not too critical in the performance of the system and a typical value of 0.9 is used.

Notice from Eq. 11 that V deemphasizes the formant regions of the input signal $\mathbf{x}$ and that the modified filter T(z) (whose convolution matrix is $VR = T$) has a transfer function $1/[1 - P(z/g)]$. Also an interesting case arises for g=0, where $\mathbf{y} = V\mathbf{x}$ becomes the LPC residual and $D^T D$ is a unit matrix. The optimum k pulse excitation sequence consists in this case (see Eq. 13) of the k most significant in amplitude samples of the LPC residual.

The pulse amplitudes $\mathbf{h}$ can be efficiently calculated using Eq. 13 by forming the n-valued cross-correlation $C_{Ty} = T^T\mathbf{y}$ between the transformed input signal $\mathbf{y}$ and the impulse response of T(z) only once per analysis frame. Note here that T is the full n×n matrix as opposed to the n×k matrix D. $C_{Ty}$ can be conveniently obtained at the output of the modified synthesis filter whose input is the time reversed signal $\mathbf{y}$. Thus instead of calculating the k cross-correlation values $D^T\mathbf{y}$ every time Eq. 13 is solved for a particular set of pulse positions, the algorithm selects from $C_{Ty}$ the values which correspond to the position of the excitation pulses and in this way the computational complexity is reduced.
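The block solution of Eq. 13, with $D^T\mathbf{y}$ picked out of a cross-correlation that is formed once per frame, can be sketched as follows. The sketch makes several assumptions that are not in the text: the function names are hypothetical, positions are counted from 0, `impulse` stands for the (finite, truncated) impulse response of T(z), and a plain Gaussian elimination is used in place of whatever solver a real coder would employ.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def column(impulse, n, p):
    """Column p of the convolution matrix T: the impulse response delayed by p."""
    return [impulse[t - p] if 0 <= t - p < len(impulse) else 0.0
            for t in range(n)]


def cross_correlation(impulse, y):
    """C_Ty = T^T y, formed once per frame; element m is sum_t r[t-m] * y[t]."""
    n = len(y)
    return [sum(impulse[t - m] * y[t]
                for t in range(m, min(n, m + len(impulse))))
            for m in range(n)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def block_amplitudes(impulse, y, positions, c_ty=None):
    """Return (h, weighted error energy) for one set of pulse positions."""
    n = len(y)
    if c_ty is None:
        c_ty = cross_correlation(impulse, y)
    d = [column(impulse, n, p) for p in positions]
    dtd = [[dot(u, v) for v in d] for u in d]
    dty = [c_ty[p] for p in positions]      # selected, not recomputed
    h = solve(dtd, dty)                     # Eq. 13
    energy = dot(y, y) - dot(h, dty)        # Eq. 14
    return h, energy
```

When several candidate position sets are tried within one frame, `c_ty` would be computed once and passed in on every call, which is the saving the paragraph above describes.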

Another simplification results from the fact that only one pulse position, out of k, is changed when a different set of positions is tried. As a result the symmetric matrix $D^T D$ found in Eq. 13 only changes in one row and one column every time the configuration of the pulses is altered. Thus given the initial estimate, the amplitudes $\mathbf{h}$ for each of the following multipulse configurations can be efficiently calculated with approximately $k^2$ multiplications compared to the $k^3$ multiplications otherwise required.

Finally an approximation is introduced to further reduce the computational burden of forming the $D^T D$ matrix for each set of pulse positions.

$D^T D$ is formed from estimates of the autocovariance of the T(z) filter's impulse response. These estimates are also elements of a larger n×n $T^T T$ matrix. The method is considerably simplified by making $T^T T$ Toeplitz. In this case there are only n different elements in $T^T T$ which can be used to define $D^T D$ for any configuration of excitation pulses. These elements need only to be determined once per analysis frame by feeding through T(z) its reversed in time impulse response. In practice, though, it is more efficient to carry out updating (as opposed to recalculation) processes on the inverse matrix $(D^T D)^{-1}$.
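A minimal sketch of the Toeplitz approximation follows, assuming a finite impulse response held in a list and 0-based pulse positions; the function names are hypothetical. The n autocorrelation values are formed once per frame, and every entry of the approximate $D^T D$ is then a table look-up indexed by the distance between two pulse positions.

```python
def autocorrelation(impulse, n):
    """r[d] = sum_t impulse[t] * impulse[t + d] for lags d = 0 .. n-1."""
    length = len(impulse)
    return [sum(impulse[t] * impulse[t + d] for t in range(length - d))
            if d < length else 0.0
            for d in range(n)]


def toeplitz_dtd(r, positions):
    """Approximate D^T D: entry (i, j) depends only on |p_i - p_j|."""
    return [[r[abs(p_i - p_j)] for p_j in positions] for p_i in positions]
```

With `impulse = [1.0, 0.5, 0.25]`, the lags are 1.3125, 0.625, 0.25 and then zeros, so pulses at positions 2, 3 and 8 give a matrix whose only non-zero off-diagonal entries (0.625) couple the adjacent pulses at 2 and 3. The approximation differs from the exact autocovariance for pulses near the end of the frame, where the exact column is truncated by the frame boundary.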

Consider now the pulse transfer stage. The convergence of the proposed scheme towards a minimum weighted error depends on the pulse selection and transfer procedures employed to define various k-pulse excitation sequences. Once the initial excitation estimate has been determined, a pulse is selected for possible transfer to another position within the analysis frame (see FIG. 2).

The criteria for this selection, and for selecting its destination, may vary. In the examples which follow, the destination positions are, for convenience, examined sequentially starting at one end of the frame. Clearly, other sequences would be possible.
The pulse selection procedure employs the term $\mathbf{h}^T D^T \mathbf{y}$ of Eq. 14, which represents the energy of the synthesised signal and is the sum of k energy terms. Each of these terms, which is the product of an excitation pulse amplitude with the corresponding element of the cross correlation vector $C_{Ty}$, represents the energy contribution of the pulse towards the total energy of the synthesized signal. The pulse with the smallest energy contribution is considered as the most likely one to be located in the wrong position and it is therefore selected for possible transfer to another position.
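As an illustration of this selection rule, the following sketch computes the k energy terms and returns the index of the smallest. The names are hypothetical, `c_ty` is assumed to hold the frame's cross-correlation values indexed by 0-based position, and `h` is assumed to be the block solution for those positions.

```python
def energy_contributions(h, c_ty, positions):
    """One term per pulse: amplitude times the cross-correlation at its position."""
    return [h_i * c_ty[p] for h_i, p in zip(h, positions)]


def lowest_energy_pulse(h, c_ty, positions):
    """Index (into positions) of the pulse with the smallest energy term."""
    terms = energy_contributions(h, c_ty, positions)
    return min(range(len(terms)), key=terms.__getitem__)
```

With amplitudes `[3.0, -2.0, 0.5]` at positions `[2, 8, 5]` and cross-correlation values 4.0, -3.0 and 1.0 at those positions, the terms are 12.0, 6.0 and 0.5, so the third pulse (index 2) is the candidate for transfer.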

The procedure adopted is as follows:

a. choose the "lowest energy pulse" using the above criterion.

b. define a new excitation vector in which the pulse positions are as before except that the chosen pulse is deleted and replaced by one at position w (w is initially 1).

c. recalculate the amplitudes for the pulses, as described above.

d. compare the new weighted error with the reference error:
- if the new error is not lower, increase w by one and return to step b to try the next position. Repetition of step a is not necessary at this point since the "lowest energy" pulse is unchanged.
- if the error is lower, retain the new position, make the new error the reference, increment w, and return to step a to identify which pulse is now the "lowest energy" pulse.

This process continues until w reaches n, i.e. all possible "destination" positions have been tried. During the process, of course, the previous position of the pulse being tested, and positions already containing a pulse, are not tested, i.e. w is "skipped" over those positions. As an extension of this, different selection criteria may be employed in dependence on whether the "destination" in question is a pulse position adjacent an existing pulse, i.e. each pulse at position j defines a region from j − λ to j + λ and when w lies within a region a different criterion is used. For example:

A. outside regions: "lowest energy" pulse selected
within regions: no pulse selected, thus when w reaches j − λ it is automatically incremented to j + λ + 1

B. outside regions: "lowest energy" pulse selected
within region: the pulse defining the region is selected

C. outside regions: no pulse selected
within region: the pulse defining the region is selected.
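The basic transfer procedure (steps a to d above, without the region-based variants A to C) can be sketched as below. This is a simplified illustration under stated assumptions rather than the coder itself: positions are 0-based, `impulse` is a hypothetical finite impulse response of the weighted filter with a non-zero first sample, the amplitudes are re-solved from scratch for every trial (the text describes cheaper updating schemes), and all names are invented for the example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def column(impulse, n, p):
    """Impulse response delayed by p samples, truncated to the frame."""
    return [impulse[t - p] if 0 <= t - p < len(impulse) else 0.0
            for t in range(n)]


def solve(a, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def amplitudes_and_error(impulse, y, positions):
    """Block solution (Eq. 13), per-pulse energy terms and error (Eq. 14)."""
    d = [column(impulse, len(y), p) for p in positions]
    dty = [dot(col, y) for col in d]
    h = solve([[dot(u, v) for v in d] for u in d], dty)
    terms = [h_i * c_i for h_i, c_i in zip(h, dty)]
    return h, terms, dot(y, y) - sum(terms)


def pulse_transfer(impulse, y, positions):
    """Try each free destination w in turn, moving the lowest-energy pulse."""
    positions = list(positions)
    h, terms, ref_error = amplitudes_and_error(impulse, y, positions)
    for w in range(len(y)):                 # destinations examined in order
        if w in positions:                  # occupied positions are skipped
            continue
        weakest = min(range(len(terms)), key=terms.__getitem__)    # step a
        trial = positions[:]
        trial[weakest] = w                                         # step b
        new_h, new_terms, error = amplitudes_and_error(impulse, y, trial)  # step c
        if error < ref_error:                                      # step d
            positions, h, terms, ref_error = trial, new_h, new_terms, error
    return positions, h, ref_error
```

Starting from a deliberately wrong estimate such as positions `[3, 8]` for a target whose true pulses sit at 2 and 8, the loop moves the weaker pulse from 3 to 2 when w reaches 2 and rejects every later trial, since none of them lowers the error further.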

FIGS. 3a and 3b illustrate the successive pulse position patterns examined when the algorithm employs the B scheme. In FIG. 3a an analysis frame of n=180 samples is used while n=120 in FIG. 3b. In both cases the number of pulses, k, is equal to n/10.

In practice, the coding method might be implemented using a suitably programmed digital computer. More preferably, however, a digital signal processing (DSP) chip, which is essentially a dedicated microprocessor employing a fast hardware multiplier, might be employed.

The coding method discussed in detail above might conveniently be summarised as follows. For each frame:

I. Evaluate the LPC filter coefficients, using the maximum entropy method.

II. (a) find the impulse response of the weighted filter (this gives us the convolution matrix T = VR)
(b) find the autocorrelation of the weighted filter's impulse response
(c) subtract the memory contribution and weight the results; i.e. find $\mathbf{y} = V\mathbf{x} = V(\mathbf{s} - \mathbf{m})$
(d) find the cross-correlation of the weighted signal and the weighted impulse response

III. Make the initial estimate by starting with j=1 and $\mathbf{q}_i(1)$ being the cross-correlation values already found:
(a) find the largest of $\|\mathbf{q}_i(j)\|$, which is $\|\mathbf{q}_{max}(j)\| = \|\mathbf{q}_i(j)\|$, noting the value of i
(b) find the n values $\|\mathbf{q}_{max}(j)\|\, R_{il}$
(c) subtract these from $\|\mathbf{q}_l(j)\|$ to give $\|\mathbf{q}_l(j+1)\|$
(d) repeat steps (a) to (d) until k values of i, which are the derived pulse positions, have been found.

IV. Find the amplitudes by
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(a) , finding Ci^—D^y (obtained from the k pulse B/; while the first three systems approximate these esti- 
positions simply by selecting the relevant col- mates with the auto correlation values R//. 

umns of the cross-correlation from II(d)above) The method proposed here, in essence lifts the pulse 

(b) . find the amplitudes h using the steps defined by location search restrictions found in the methods re- 
equation (13); (D 7 !))- 1 is initially calculated and 3 f errcc j to ear ii er . The error to be inmimized is always 
then updated calculated for a set of k pulses, in a way similar to the 

(c) . finding the k energy E Cpy amplitude optimization technique previously encoun- 

V, Carry out the pulse position adjustment by-start- tered> md nQ invoked regarding pulse 

M ^ tn l Weal: , . . ...... . amplitudes or locations. The algorithm commences 

(a) , checking whether w is within^* of an existing io ^ * m ^ of ^ k^nsional subspace 
pulse, and if not (assuming option A) omitting sequentially the subspace, and 
sibstiS^TuL^t 2ow rgy therefore the pulse positions, in search of the optimum 

(b) . repeat steps IV to find the new amplitudes and ^ ution - Pulse ampUtudes are calculated with a 
error 15 olock method which projects the input signal s onto 

(c) . advance w to the next available position— if cach subspace under consideration. 

none is available, proceed to step (f) proposed system has the potential to out-perform 

(d) . if the error is not lower than the reference conventional multipulse excitation systems systems and 
error, return to step Va i* 5 performance depends on the search algorithms em- 

(e) . if the error is lower, make the new error the 20 ployed to modify, sequentially the k dimensional sub- 
reference error, retain the new amplitude and space under consideration. 

position and energy terms and return to step (a) A further modification of iterative adjustment pro- 

(f) . calculate the memory contribution for the next cess and more especially the criteria for selection of 
frame pulses whose positions are to be reassessed will now be 

VI. Encode the following information for transmis- 25 considered The option to be discussed is a modification 
sion: of scheme (C) referred to above. 

(a) , the filter coefficients The aim is to reduce the computational requirements 

(b) . the k pulse positions 0 f the multipulse LPC algorithm described, without 

(c) . the k pulse amplitudes. reducing the subjective and SNR performance of the 
VIL Upon reception of this information, the decoder 30 system. In scheme C, given the initial excitation esti- 

(a) . sets the LPC filter coefficients mate, each excitation pulse defines a± A region and only 

(b) . generates an excitation pulse sequence having k t he possibility of transferring a pulse to a location 
pulses whose positions and amplitudes are as within its own region is examined by the algorithm, 
defined by the transmitted data. Thus each of the k initial excitation pulses is tested for 

A typical set of parameters for a coder is as follows:

Bandwidth                        3.4 kHz
Sampling rate                    8000 per second
[entry illegible in source]
LPC update period                22.5 ms
Frame size (n)                   120 samples
Spectral shaping factor (g)      0.9
No. of pulses per frame (k)      12 (800 pulses/sec)

Results obtained by computer simulation, using sentences of both male and female speech, are illustrated in FIGS. 4 to 7. Except where otherwise indicated, the parameters are as stated above. In FIG. 4, segmented signal-to-noise ratio, averaged over 3 sec of speech, for pulse transfer options A and B, is shown for LPC prediction order varying from 6 to 16.

In FIG. 5 the noise shaping constant g was varied. 0.9 appears close to optimum. FIG. 6 shows the variation of SNR with frame size (pulse rate remaining constant). The small increase in SEG-SNR can be attributed to the improved autocorrelation estimates R_ij obtained when larger analysis frames are used. It is also evident, from FIG. 6, that the proposed algorithms operate satisfactorily with small analysis frames which lead to computationally efficient implementations. FIG. 7 compares the SEG-SNR performance of five multipulse excitation algorithms for a range of pulse rates. Curve 0 gives the performance of the simplified algorithm used to form the Initial Position Estimate for the systems A and B, whose performance curves are A and B. Curve Q corresponds to the algorithm used by Atal and Remde, while curve S shows the performance of that algorithm when amplitude optimization is applied every time a new pulse is added to the excitation sequence. Note that the latter two systems employ the autocovariance estimates

transfer into one of ±\ neighbouring locations.

The complexity of the algorithm implementing scheme C is, it is proposed, reduced by testing only k_1 pulses for possible transfer, where k_1 < k. The question arises of how to select, for transfer, k_1 out of the k initial excitation pulses.

The proposed pulse selection procedure is based on the following two requirements:

(i) the k_1 pulses to be tested are associated with a high probability of being transferred to another location within the search region.

(ii) given that an initial excitation pulse is to be transferred to another location, this transfer results in a considerable change in the energy of the synthesized signal in approximating the energy of the input signal.

Recall (equation 14) that the energy of the synthesized signal is E^T D^T y, which is the sum of k energy terms, h_i d_{p_i}^T y, and D = [d_{p_1}, d_{p_2}, . . . , d_{p_k}]. Each of these terms represents the energy contribution of an excitation pulse towards the total energy of the synthesized signal. Using the (approximate) assumption that the energy contribution of each pulse is independent of the positions/amplitudes of the remaining excitation pulses, one can then relate the above two requirements to a normalized energy measure E_i associated with an excitation pulse i:

    E_i = \frac{h_i \, d_{p_i}^T y}{\sum_{j=1}^{k} h_j \, d_{p_j}^T y}    (16)
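As an illustration only, the normalized energy measure of equation 16 can be sketched in a few lines of Python. The function names, the list-based data layout and the toy numbers below are assumptions made for this sketch and do not come from the specification. The pulse amplitudes h_i are taken as already known (for example from the block solution for the amplitudes), and ranking the pulses by E_i is shown merely as one simple way of using the measure.

```python
def normalized_pulse_energies(amplitudes, responses, y):
    """Return the normalized energy E_i of each pulse (sketch of equation 16).

    amplitudes : list of k pulse amplitudes h_i (assumed already computed)
    responses  : list of k vectors d_{p_i}, the filter impulse response
                 shifted to pulse position p_i
    y          : target vector of the frame
    """
    # Energy term of pulse i: h_i * (d_{p_i}^T y)
    terms = [
        h * sum(d_t * y_t for d_t, y_t in zip(d, y))
        for h, d in zip(amplitudes, responses)
    ]
    total = sum(terms)
    if total == 0:
        raise ValueError("total synthesized-signal energy is zero")
    return [t / total for t in terms]


def rank_pulses_by_energy(energies, k1):
    """Return the indices of the k1 pulses with the largest normalized energy.

    Ranking by E_i is an illustrative choice for this sketch.
    """
    order = sorted(range(len(energies)), key=lambda i: energies[i], reverse=True)
    return order[:k1]


if __name__ == "__main__":
    # Toy frame of n = 4 samples with k = 2 non-overlapping pulses (hypothetical data).
    d1 = [1.0, 0.5, 0.0, 0.0]
    d2 = [0.0, 0.0, 1.0, 0.5]
    y = [1.0, 0.5, 2.0, 1.0]   # equals 1.0 * d1 + 2.0 * d2
    h = [1.0, 2.0]             # least-squares amplitudes for this toy case
    energies = normalized_pulse_energies(h, [d1, d2], y)
    print(energies)
    print(rank_pulses_by_energy(energies, 1))
```

By construction the values E_i sum to one, so each can be read as the fraction of the synthesized-signal energy attributed to one pulse under the independence assumption stated above.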
In particular, given that E_i lies within the small energy interval E_K, the probability of pulse relocation p(E_K) is

    p(E_K) = \frac{m_K}{n_K}    (17)

where n_K is the number of pulses with energy values within the E_K interval and only m_K of these pulses are actually relocated by the search procedure.

In the second requirement the energy change Q, which results from relocating a pulse from the p_i location to p_i', is given by

    Q = \frac{h_i' \, d_{p_i'}^T y - h_i \, d_{p_i}^T y}{\sum_{j=1}^{k} h_j \, d_{p_j}^T y}    (18)

An average energy change per transferred pulse is now formed as

    Q_{av}(E_K) = \sum_{j=-\infty}^{\infty} P_{Q_K,j} \, Q_j    (19)

where

    P_{Q_K,j} = \frac{n_{Q_K,j}}{m_K}

m_K is the number of pulses relocated by the search procedure, whose energy value lies within the E_K interval, while n_{Q_K,j} is the number of those of the m_K pulses whose relocation resulted in an energy change value Q lying within the small energy interval E_j.

Using p(E_K) and Q_av(E_K) an Energy Gain Function G_e is thus defined as

    G_e = p(E_K) \, Q_{av}(E_K) = \frac{1}{n_K} \sum_{j=-\infty}^{\infty} n_{Q_K,j} \, Q_j    (20)

and represents the average energy change per pulse, which results from the relocated pulses, whose normalized energy E falls within the E_K interval.

Clearly then, the value of the Energy Gain Function G_e should be larger for the k_1 pulses, selected to be tested for possible transfer, than for the remaining k-k_1 pulses in the initial excitation estimate.

In practice, a plot of Energy Gain Function against normalized Energy E can be obtained, e.g. from several seconds of male and female speech, while a piecewise linear representation is a convenient simplification of this function. The problem of selecting for possible relocation k_1 out of k pulses can now be solved using this data. That is, given the initial sequence of excitation pulses, the normalized energy E_i is measured for each pulse and the corresponding G_e values are found from the plot, e.g. as a stored look-up table or computed criteria based on the piecewise linear approximation. Those k_1 pulses with the largest G_e values are then selected and tested for relocation.

FIG. 8 shows a typical G_e v. E plot, along with a piecewise linear approximation. It will be noted that if, as shown, the curve is monotonic (which is not always the case) then the largest G_e always corresponds to the largest E. In this instance the conversion is unnecessary: the method reduces to selecting only those k_1 pulses with the largest values of E. In some circumstances it may be appropriate to use E' instead of E as the horizontal axis for the plot, and indeed this is in fact so for FIG. 8. (E' is given by equation 16 with h' and d' substituted for h and d).

FIG. 9 shows the signal-to-noise ratio performance against multiplications required per input sample, for the following four multistage sequential search algorithms:

A: ATAL's scheme with amplitude optimization at each stage.
Z: ATAL's scheme without amplitude optimization at each stage.
X: INITIAL ESTIMATE algorithm with amplitude optimization at each stage.
K: INITIAL ESTIMATE algorithm without amplitude optimization at each stage.

as well as for the proposed block sequential algorithm with the simplified scheme C of pulse selection and destination when allowing 1/6, 2/6, 3/6 and 4/6 of the initial pulses to be tested for transfer.

The graph shows average segmental SNR obtained at a constant pulse rate with different multipulse algorithms (solid line), for a particular speech sentence. The horizontal axis indicates the algorithm complexity in number of multiplications per sample. The intermittent line shows the SNR performance of each algorithm when its complexity is varied by changing the pulse rate.

Note that the complexity of the proposed algorithm is considerably reduced for small transfer pulse ratios while the SNR performance is almost unaffected.

FIG. 10 shows, for the above system, the number of multiplications required per input sample versus excitation pulses per second.

FIG. 11 illustrates the SNR performance of the proposed system for different values of pulse ratios to be tested for transfer. Results are shown for 800 pulses/sec (10 percent), 1200 pulses/sec (15 percent) and 1600 pulses/sec (20 percent). Note that the solid line in FIG. 11 corresponds to performance of the Initial Estimate algorithm with amplitude optimization at each stage of the search process.

We claim:

1. A method of speech coding comprising:
receiving speech samples;
processing the speech samples to derive parameters representing a response of a synthesis filter;
deriving, from the parameters and the speech samples, pulse position and amplitude information defining an excitation consisting, within each of successive time frames corresponding to a plurality n of said speech samples, of a pulse sequence containing a smaller plurality k of pulses;
wherein the pulse position and amplitude information of the k pulses is derived by:
  (1) deriving an initial estimate of the positions and amplitudes of the k pulses, and
  (2) carrying out an iterative adjustment process by:
    (a) selecting individual ones of the k pulses according to predetermined criteria, and
    (b) substituting for each such selected pulse a pulse in an alternative position whenever a computed error signal is thereby reduced, said error signal being obtained by comparing speech samples with the response of a filter having said parameters to an excitation which includes said selected pulse and others of said pulses, said substituted alternative position thereby being obtained as a function of the position and amplitudes of said other pulses.

2. A method according to claim 1 in which said initial estimate of the pulse positions is made by cross-correlating a set of n input speech sample amplitudes occurring during each frame with each of a set of normalized vectors corresponding to time-shifted impulse responses of the filter and selecting the relative positions of the k largest values of such cross-correlation as the k pulse positions used in said initial estimate.

3. A method according to claim 1 in which said initial estimate of the k pulse positions is made by cross-correlating a set of n input speech sample amplitudes during each frame and each of a set of normalized vectors corresponding to time-shifted impulse responses of the filter and selecting the relative position of the largest value of such cross-correlation as the first pulse position in said initial estimate; with successive k-1 pulse positions corresponding to the position of a largest value of adjusted further cross-correlations between an input speech vector and the said normalized vectors, the further cross-correlations for each successive pulse position selection having been adjusted by subtraction of values representing orthogonal projections of vector representations of earlier selected pulses onto axes represented by corresponding normalized vectors.

4. A method according to claim 1, 2 or 3 in which the iterative adjustment process is effected by repeated selection of one of the pulses according to a predetermined criterion, and substituting for that pulse a pulse in an alternative position only if such substitution results in a reduction in the said error, the pulse amplitudes being again derived following each such substitution.

5. A method according to claim 4 in which the predetermined criterion for pulse selection is effected by deriving k energy terms, each of which is the product of a pulse amplitude and the corresponding term of the vector formed by multiplying a convolution matrix of the filter and the difference between said input speech vector and a filter response from previous frames, each being adjusted by any perceptual weighting factor.

6. A method according to claim 4 in which the alternative positions are selected successively in sequence from n available positions, such that no alternative position is tested for substitution more than once.

7. A method according to claim 6 in which zones are defined as including a predetermined number of potential alternative positions adjacent a position already occupied by a pulse, and different criteria for selection of a pulse to be substituted are employed dependent on whether a selected alternative position is within or outside the said zones.

8. A method according to claim 7 in which whenever the selected alternative position falls within a zone, no pulse is selected for substitution.

9. A method according to claim 7 in which whenever a next available alternative position in sequence is within one of the zones a pulse defining that zone is selected for possible substitution.

10. A method according to claim 6 in which only certain pulses are selected for possible substitution, those pulses being those whose normalized energy has a larger energy gain function than the unselected pulses, the energy gain function for pulses having energies lying within a given energy interval being an average energy change resulting from relocation of a pulse having an energy within that interval.

11. A method according to claim 10 in which the energy gain function for each pulse is obtained from a lookup table having entries for energy intervals and corresponding energy gain functions, the lookup table having been empirically derived from a training sequence of speech.

12. A method according to claim 1, 2 or 3 in which the pulse amplitudes, in the initial estimate step or during the iterative adjustment process, are calculated using the relation

    E = (D^T D)^{-1} D^T y

where E is a vector consisting of k amplitudes, D is a set of time shifted filter impulse responses corresponding to the pulse positions, and y is a difference between the input speech vector and the filter response from previous frames; D and y being adjusted by a perceptual weighting.

13. An apparatus for speech coding comprising:
means for receiving speech samples;
means for processing the speech samples to derive parameters representing a response of a synthesis filter;
means for deriving, from the parameters and the speech samples, pulse position and amplitude information defining an excitation consisting, within each of successive time frames corresponding to a plurality n of said speech samples, of a pulse sequence containing a smaller plurality k of pulses;
wherein the means for deriving pulse position and amplitude information of the k pulses includes:
  (1) further means for deriving an initial estimate of the positions and amplitudes of the k pulses, and
  (2) means for carrying out an iterative adjustment process by:
    (a) selecting individual ones of the k pulses according to predetermined criteria, and
    (b) substituting for each such selected pulse a pulse in an alternative position whenever a computed error signal is thereby reduced, said error signal being obtained by means for comparing speech samples with the response of a filter having said parameters to an excitation which includes said selected pulse and others of said pulses, said substituted alternative position thereby being obtained as a function of the position and amplitudes of said other pulses.

14. An apparatus according to claim 13 in which said initial estimate of the pulse positions is made by means for cross-correlating a set of n input speech sample amplitudes occurring during each frame with each of a set of normalized vectors corresponding to time-shifted impulse responses of the filter and means for selecting the relative positions of the k largest values of such cross-correlation as the k pulse positions used in said initial estimate.

15. An apparatus according to claim 13 in which said initial estimate of the k pulse positions is made by means for cross-correlating a set of n input speech sample amplitudes during the frame and each of a set of normalized vectors corresponding to time-shifted impulse responses of the filter and means for selecting the relative position of the largest value of such cross-correlation as the first pulse position in said initial estimate; with successive k-1 pulse positions corresponding to the position of a largest value of adjusted further cross-correlations between an input speech vector and the said normalized vectors, the further cross-correlations for each successive pulse position selection having been adjusted by means for subtracting values representing orthogonal projections of vector representations of earlier selected pulses onto axes represented by corresponding normalized vectors.

16. Apparatus according to claim 13, 14 or 15 in which the iterative adjustment process is effected by repeated selection of one of the k pulses according to a predetermined criterion, and further including means for substituting for said selected pulse a pulse in an alternative position only if such substitution results in a reduction in the said error signal, the pulse amplitudes being again derived following each such substitution.

17. Apparatus according to claim 16 in which the predetermined criterion for pulse selection is effected by deriving k energy terms, each of which is the product of a pulse amplitude and the corresponding term of the vector formed by means for multiplying a convolution matrix of the filter and the difference between said input speech vector and a filter response from previous frames, each being adjusted by any perceptual weighting factor.

18. Apparatus according to claim 16 in which the alternative positions are selected successively in sequence from the available positions, such that no alternative position is tested for substitution more than once.

* * * * *